In this Markdown we will look at a one example where we will go through much of the content of the course covered thus far, with particular emphasis on some tools for exploring potential predictor transformations in multiple regression.
Ozone is one of the nasty constituents of photochemical smog, and its concentration is the standard indicator of the severity of such smog. If its level is high, a smog alert is called. Ozone is not emitted directly into the atmosphere. Rather, it is a product of chemical reactions that require solar radiation and emissions of primary pollutants from smoke stacks and automobile exhaust. When the ventilation of the atmosphere is low, the chemical reactions bring ozone to high levels. Low ventilation occurs when wind speeds are low and temperature is high because on hot, calm days in the summer the atmosphere cannot cleanse itself. The goal in the analysis of these data is to determine how ozone depends on the other variables.
A brief description of the variables and their abbreviations is given below:
We begin our analysis by examining a scatterplot matrix of the data. It appears that upper ozone concentration is related to several of the predictors, although not linearly. We also see that several of the variables are correlated with one another.
First read the data in from OzoneData.csv and load the car
library.
require(car)
## Loading required package: car
Ozone = read.csv(file="http://course1.winona.edu/bdeppa/Regression/Data/Ozone.csv")
names(Ozone)
## [1] "upoz" "day" "v500" "wind" "hum" "safb" "inbh" "dagg" "inbt" "vis"
head(Ozone)
## upoz day v500 wind hum safb inbh dagg inbt vis
## 1 3 3 5710 4 28 40 2693 -25 87 250
## 2 5 4 5700 3 37 45 590 -24 128 100
## 3 5 5 5760 3 51 54 1450 25 139 60
## 4 6 6 5720 4 69 35 1568 15 121 60
## 5 4 7 5790 6 19 45 2631 -33 123 100
## 6 4 8 5790 3 25 55 554 -28 182 250
summary(Ozone)
## upoz day v500 wind
## Min. : 1.00 Min. : 3.00 Min. :5320 Min. : 0.000
## 1st Qu.: 5.00 1st Qu.: 90.25 1st Qu.:5690 1st Qu.: 3.000
## Median :10.00 Median :177.50 Median :5760 Median : 5.000
## Mean :11.78 Mean :181.73 Mean :5750 Mean : 4.891
## 3rd Qu.:17.00 3rd Qu.:275.75 3rd Qu.:5830 3rd Qu.: 6.000
## Max. :38.00 Max. :365.00 Max. :5950 Max. :21.000
## hum safb inbh dagg
## Min. :19.00 Min. :25.00 Min. : 111.0 Min. :-69.00
## 1st Qu.:47.00 1st Qu.:51.00 1st Qu.: 877.5 1st Qu.: -9.00
## Median :64.00 Median :62.00 Median :2112.5 Median : 24.00
## Mean :58.13 Mean :61.75 Mean :2572.9 Mean : 17.37
## 3rd Qu.:73.00 3rd Qu.:72.00 3rd Qu.:5000.0 3rd Qu.: 44.75
## Max. :93.00 Max. :93.00 Max. :5000.0 Max. :107.00
## inbt vis
## Min. :-25.0 Min. : 0.0
## 1st Qu.:107.0 1st Qu.: 70.0
## Median :167.5 Median :120.0
## Mean :161.2 Mean :124.5
## 3rd Qu.:214.0 3rd Qu.:150.0
## Max. :332.0 Max. :350.0
The pairs.plus
command the Regression.RData
directory that sent you at the beginning of the course will create enhanced scatterplot matrix.
pairs.plus(Ozone)
We begin by fitting a baseline model using \(Y = Ozone\) as the response and all of the predictors in the original form as the terms in the model. We will examine a model summary of this model and look at a set of four diagnostic plots for assessing the adequacy of the model using the plot(model)
command. Note that as there are four plots in this set I have used the command par(mfrow=c(2,2))
to set up a plotting multiple figure region with 2 rows and 2 columns of plots.
lm.oz1 = lm(upoz~.,data=Ozone)
summary(lm.oz1)
##
## Call:
## lm(formula = upoz ~ ., data = Ozone)
##
## Residuals:
## Min 1Q Median 3Q Max
## -12.2407 -2.8832 -0.3353 2.7409 13.3523
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 18.3792938 29.5045242 0.623 0.53377
## day -0.0088490 0.0027199 -3.253 0.00126 **
## v500 -0.0051340 0.0053950 -0.952 0.34200
## wind -0.0198304 0.1238829 -0.160 0.87292
## hum 0.0804923 0.0188345 4.274 2.54e-05 ***
## safb 0.2743349 0.0497361 5.516 7.17e-08 ***
## inbh -0.0002497 0.0002950 -0.846 0.39798
## dagg -0.0036968 0.0112925 -0.327 0.74360
## inbt 0.0292640 0.0136115 2.150 0.03231 *
## vis -0.0080742 0.0037565 -2.149 0.03235 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.441 on 320 degrees of freedom
## Multiple R-squared: 0.7011, Adjusted R-squared: 0.6927
## F-statistic: 83.4 on 9 and 320 DF, p-value: < 2.2e-16
par(mfrow=c(2,2))
plot(lm.oz1)
par(mfrow=c(1,1))
The residual plot suggest both curvature (i.e. the model is inadequate) and the variation appears to be constant. We can formally test these using Tukey’s test for nonadditivity and the Score test. The car
library contains functions for performing both of these tests. The command residualPlots
constructs a series of residual plots: \((\hat{e_i}\ vs. each\ term)\) and \(\hat{e_i}\ vs.\ \hat{y_i}\). If you supply the tests=T
option to this command it will tests the significance of adding a squared term for each predictor and conducts overall Tukey test using the fitted values squared \(\hat{y_i}^2\) as term to the model. The function ncvTest(model)
we conduct the overall score test again using the fitted values \(\hat{y_i}\) to model the squared residuals or a custom model using specified terms to model the variance.
# Manually conducting the Tukey Test for Nonadditivity
yhat2 = fitted(lm.oz1)^2
lm.tukey = lm(upoz~.+yhat2,data=Ozone)
summary(lm.tukey)
##
## Call:
## lm(formula = upoz ~ . + yhat2, data = Ozone)
##
## Residuals:
## Min 1Q Median 3Q Max
## -12.124 -2.307 -0.094 2.454 12.713
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.916e+01 2.753e+01 -1.059 0.290
## day -1.865e-03 2.625e-03 -0.711 0.478
## v500 6.296e-03 5.116e-03 1.231 0.219
## wind -1.038e-01 1.134e-01 -0.916 0.361
## hum 1.569e-02 1.894e-02 0.829 0.408
## safb -1.318e-02 5.753e-02 -0.229 0.819
## inbh -3.147e-04 2.691e-04 -1.169 0.243
## dagg 5.332e-03 1.036e-02 0.515 0.607
## inbt -6.227e-03 1.316e-02 -0.473 0.636
## vis -1.850e-04 3.560e-03 -0.052 0.959
## yhat2 3.936e-02 4.846e-03 8.122 1.02e-14 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.049 on 319 degrees of freedom
## Multiple R-squared: 0.7523, Adjusted R-squared: 0.7446
## F-statistic: 96.9 on 10 and 319 DF, p-value: < 2.2e-16
residualPlots(lm.oz1,tests=T)
## Test stat Pr(>|t|)
## day -5.899 0.000
## v500 3.861 0.000
## wind 0.581 0.561
## hum -1.476 0.141
## safb 5.762 0.000
## inbh 0.602 0.548
## dagg -4.352 0.000
## inbt 5.108 0.000
## vis 1.980 0.049
## Tukey test 8.122 0.000
# NCV plot along with test using ncvTest from car library
ncv.plot(lm.oz1)
# Overall Score Test
ncvTest(lm.oz1)
## Non-constant Variance Score Test
## Variance formula: ~ fitted.values
## Chisquare = 35.04723 Df = 1 p = 3.218045e-09
# Custom non-constant variance test using specified terms
ncvTest(lm.oz1,~v500+hum+dagg+inbt,data=Ozone)
## Non-constant Variance Score Test
## Variance formula: ~ v500 + hum + dagg + inbt
## Chisquare = 45.29404 Df = 4 p = 3.453833e-09
Despite the fact some the predictors appear to have mildly nonlinear relationships between them, we will examine an inverse fitted value plot \((\hat{y_i}\ vs. y_i)\) using the invResPlot
command from the car
library. The \(\hat{\lambda}\) value reported is the “optimal” transformation for fitting the relationship exhibited in the inverse fitted value plot.
invResPlot(lm.oz1)
## lambda RSS
## 1 0.437994 3976.284
## 2 -1.000000 8034.225
## 3 0.000000 4334.323
## 4 1.000000 4424.844
The optimal power transformation from the inverse fitted value plot is \(\hat{\lambda}=0.44\), which would lead us to consider possibly a square root or cube root transformation.
The function BCtran
in my Regression.RData
will conduct the Box-Cox transformation procedure for the response. Also the function powerTransform
in the car
library will do the same. Recall this procedure is finding a power transform for the response \(Y\) which makes it the most normally distributed in the transformed scale. It generally will not agree with the power transform suggested by the inverse fitted value plot.
BCtran(Ozone$upoz)
summary(powerTransform(Ozone$upoz))
## bcPower Transformation to Normality
## Est Power Rounded Pwr Wald Lwr bnd Wald Upr Bnd
## Ozone$upoz 0.1785 0.18 0.0331 0.3239
##
## Likelihood ratio tests about transformation parameters
## LRT df pval
## LR test, lambda = (0) 5.922072 1 0.01495236
## LR test, lambda = (1) 110.044096 1 0.00000000
We will now compare a few models with the response transformed \(T(Y)=\sqrt{Y}\), \(T(Y)=Y^{0.25}\), \(T(Y)=Y^{0.33}\), and \(T(Y)=log(Y)\).
oz.sr = lm(upoz^0.5~.,data=Ozone)
oz.fr = lm(upoz^0.25~.,data=Ozone)
oz.cr = lm(upoz^0.333~.,data=Ozone)
oz.log = lm(log(upoz)~.,data=Ozone)
par(mfrow=c(2,2))
plot(oz.sr)
plot(oz.fr)
plot(oz.cr)
plot(oz.log)
par(mfrow=c(1,1))
Of these four potential models the cube root transformation of the response appears to be the best. However all of them could potentially be used. We will now examine residual plots for the cube root response model, test for curvature, and test for nonconstant variance.
residualPlots(oz.cr)
## Test stat Pr(>|t|)
## day -7.778 0.000
## v500 1.483 0.139
## wind 0.447 0.656
## hum -3.010 0.003
## safb 2.131 0.034
## inbh -1.315 0.189
## dagg -5.430 0.000
## inbt 1.612 0.108
## vis 2.093 0.037
## Tukey test 3.636 0.000
ncvTest(oz.cr)
## Non-constant Variance Score Test
## Variance formula: ~ fitted.values
## Chisquare = 0.08783124 Df = 1 p = 0.7669526
Several of the tests for curvature suggest our mean function still needs some work, while the nonconstant variance no longer is an issue. We will now use tools for identifying predictor transformations to address the curvature issue.
Before using tools to transform the predictors we will use stepwise methods (using the command step
) to drop some of the predictors that do not appear to be important via backward elimination. However, we could also consider seeking transformations first and reduce the model.
oz.step = step(oz.cr)
## Start: AIC=-850.59
## upoz^0.333 ~ day + v500 + wind + hum + safb + inbh + dagg + inbt +
## vis
##
## Df Sum of Sq RSS AIC
## - v500 1 0.00765 23.600 -852.49
## - wind 1 0.00790 23.601 -852.48
## - dagg 1 0.00829 23.601 -852.48
## - inbt 1 0.08898 23.682 -851.35
## <none> 23.593 -850.59
## - vis 1 0.45222 24.045 -846.33
## - inbh 1 0.47372 24.066 -846.03
## - hum 1 1.00407 24.597 -838.84
## - day 1 1.07978 24.672 -837.82
## - safb 1 3.09264 26.685 -811.94
##
## Step: AIC=-852.49
## upoz^0.333 ~ day + wind + hum + safb + inbh + dagg + inbt + vis
##
## Df Sum of Sq RSS AIC
## - wind 1 0.0053 23.606 -854.41
## - dagg 1 0.0093 23.610 -854.36
## - inbt 1 0.0838 23.684 -853.32
## <none> 23.600 -852.49
## - vis 1 0.4462 24.047 -848.30
## - inbh 1 0.5600 24.160 -846.75
## - hum 1 1.0414 24.642 -840.24
## - day 1 1.1086 24.709 -839.34
## - safb 1 3.1665 26.767 -812.94
##
## Step: AIC=-854.41
## upoz^0.333 ~ day + hum + safb + inbh + dagg + inbt + vis
##
## Df Sum of Sq RSS AIC
## - dagg 1 0.0078 23.613 -856.30
## - inbt 1 0.0824 23.688 -855.26
## <none> 23.606 -854.41
## - vis 1 0.4671 24.073 -849.94
## - inbh 1 0.5984 24.204 -848.15
## - hum 1 1.0427 24.648 -842.15
## - day 1 1.1383 24.744 -840.87
## - safb 1 3.1625 26.768 -814.92
##
## Step: AIC=-856.3
## upoz^0.333 ~ day + hum + safb + inbh + inbt + vis
##
## Df Sum of Sq RSS AIC
## - inbt 1 0.0774 23.691 -857.22
## <none> 23.613 -856.30
## - vis 1 0.4639 24.077 -851.88
## - inbh 1 0.6431 24.256 -849.43
## - day 1 1.2160 24.829 -841.73
## - hum 1 1.7328 25.346 -834.93
## - safb 1 4.4251 28.039 -801.62
##
## Step: AIC=-857.22
## upoz^0.333 ~ day + hum + safb + inbh + vis
##
## Df Sum of Sq RSS AIC
## <none> 23.691 -857.22
## - vis 1 0.5079 24.199 -852.22
## - day 1 1.1388 24.830 -843.73
## - hum 1 1.7220 25.413 -836.07
## - inbh 1 2.6694 26.360 -823.99
## - safb 1 20.4115 44.102 -654.15
summary(oz.step)
##
## Call:
## lm(formula = upoz^0.333 ~ day + hum + safb + inbh + vis, data = Ozone)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.71266 -0.17297 0.01515 0.19058 0.72140
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 8.873e-01 1.087e-01 8.165 7.26e-15 ***
## day -5.981e-04 1.515e-04 -3.946 9.73e-05 ***
## hum 4.093e-03 8.434e-04 4.853 1.89e-06 ***
## safb 2.221e-02 1.330e-03 16.708 < 2e-16 ***
## inbh -6.261e-05 1.036e-05 -6.042 4.18e-09 ***
## vis -5.929e-04 2.250e-04 -2.635 0.00881 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2704 on 324 degrees of freedom
## Multiple R-squared: 0.7331, Adjusted R-squared: 0.7289
## F-statistic: 178 on 5 and 324 DF, p-value: < 2.2e-16
anova(oz.step,oz.cr)
## Analysis of Variance Table
##
## Model 1: upoz^0.333 ~ day + hum + safb + inbh + vis
## Model 2: upoz^0.333 ~ day + v500 + wind + hum + safb + inbh + dagg + inbt +
## vis
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 324 23.691
## 2 320 23.593 4 0.098155 0.3328 0.8558
residualPlots(oz.step)
## Test stat Pr(>|t|)
## day -6.002 0.000
## hum -2.397 0.017
## safb 2.186 0.030
## inbh -1.273 0.204
## vis 2.087 0.038
## Tukey test 3.470 0.001
The Big F-test supports the removal of the terms dropped via backward elimination \((p=.8558)\). The residual plots and tests for curvature for the reduced model suggest curvature is an issue for the reduced model.
We will now use C+R plots and CERES plots to help us find terms to address the curvature using only the predictors remaining after backward elimination.
crPlots(oz.step)
ceresPlots(oz.step)
We will fit an overly ambitious polynomial model based upon some of the curvature exhibited in the CERES plots.
oz.poly = lm(upoz^.333~poly(day,3)+poly(hum,3)+poly(safb,2)+poly(inbh,3)+poly(vis,2),data=Ozone)
summary(oz.poly)
##
## Call:
## lm(formula = upoz^0.333 ~ poly(day, 3) + poly(hum, 3) + poly(safb,
## 2) + poly(inbh, 3) + poly(vis, 2), data = Ozone)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.78773 -0.14896 0.00627 0.16219 0.65946
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.15345 0.01364 157.858 < 2e-16 ***
## poly(day, 3)1 -0.84652 0.27964 -3.027 0.002672 **
## poly(day, 3)2 -2.04929 0.41292 -4.963 1.14e-06 ***
## poly(day, 3)3 0.52708 0.27846 1.893 0.059289 .
## poly(hum, 3)1 0.33795 0.33665 1.004 0.316209
## poly(hum, 3)2 -0.42249 0.26847 -1.574 0.116561
## poly(hum, 3)3 -0.01096 0.25632 -0.043 0.965911
## poly(safb, 2)1 4.72119 0.45523 10.371 < 2e-16 ***
## poly(safb, 2)2 0.32885 0.26052 1.262 0.207783
## poly(inbh, 3)1 -2.22061 0.33321 -6.664 1.18e-10 ***
## poly(inbh, 3)2 -0.54278 0.26977 -2.012 0.045067 *
## poly(inbh, 3)3 0.65783 0.25604 2.569 0.010652 *
## poly(vis, 2)1 -1.22464 0.30574 -4.005 7.72e-05 ***
## poly(vis, 2)2 0.95216 0.26349 3.614 0.000351 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2478 on 316 degrees of freedom
## Multiple R-squared: 0.7813, Adjusted R-squared: 0.7723
## F-statistic: 86.86 on 13 and 316 DF, p-value: < 2.2e-16
The squared terms for SAFB is not significant so we will drop it.
oz.poly2 = lm(upoz^.333~poly(day,3)+poly(hum,3)+safb+poly(inbh,3)+poly(vis,2),data=Ozone)
summary(oz.poly2)
##
## Call:
## lm(formula = upoz^0.333 ~ poly(day, 3) + poly(hum, 3) + safb +
## poly(inbh, 3) + poly(vis, 2), data = Ozone)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.80618 -0.15861 0.00941 0.17245 0.64571
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.046626 0.108091 9.683 < 2e-16 ***
## poly(day, 3)1 -0.882526 0.278445 -3.169 0.001676 **
## poly(day, 3)2 -2.100770 0.411286 -5.108 5.64e-07 ***
## poly(day, 3)3 0.517840 0.278620 1.859 0.064013 .
## poly(hum, 3)1 0.358555 0.336564 1.065 0.287533
## poly(hum, 3)2 -0.447914 0.267965 -1.672 0.095603 .
## poly(hum, 3)3 -0.009907 0.256562 -0.039 0.969223
## safb 0.017923 0.001736 10.322 < 2e-16 ***
## poly(inbh, 3)1 -2.205219 0.333294 -6.616 1.57e-10 ***
## poly(inbh, 3)2 -0.488572 0.266579 -1.833 0.067778 .
## poly(inbh, 3)3 0.702464 0.253828 2.767 0.005981 **
## poly(vis, 2)1 -1.236083 0.305896 -4.041 6.69e-05 ***
## poly(vis, 2)2 0.977910 0.262944 3.719 0.000236 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.248 on 317 degrees of freedom
## Multiple R-squared: 0.7802, Adjusted R-squared: 0.7719
## F-statistic: 93.79 on 12 and 317 DF, p-value: < 2.2e-16
Only the squared humidity terms is significant so we might consider simply squaring humidity and using that single term in our model, i.e. \(U_j=X_7^2=(humidity)^2\).
hum2 = Ozone$hum^2
oz.poly3 = lm(upoz^.333~poly(day,3)+hum2+safb+poly(inbh,3)+poly(vis,2),data=Ozone)
summary(oz.poly3)
##
## Call:
## lm(formula = upoz^0.333 ~ poly(day, 3) + hum2 + safb + poly(inbh,
## 3) + poly(vis, 2), data = Ozone)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.81164 -0.15697 0.01347 0.16724 0.66436
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.079e+00 1.149e-01 9.388 < 2e-16 ***
## poly(day, 3)1 -9.111e-01 2.782e-01 -3.275 0.001171 **
## poly(day, 3)2 -2.349e+00 3.891e-01 -6.039 4.32e-09 ***
## poly(day, 3)3 5.166e-01 2.789e-01 1.852 0.064923 .
## hum2 4.848e-06 8.933e-06 0.543 0.587722
## safb 1.711e-02 1.677e-03 10.204 < 2e-16 ***
## poly(inbh, 3)1 -2.325e+00 3.265e-01 -7.122 7.09e-12 ***
## poly(inbh, 3)2 -5.261e-01 2.626e-01 -2.003 0.046000 *
## poly(inbh, 3)3 6.315e-01 2.514e-01 2.512 0.012483 *
## poly(vis, 2)1 -1.287e+00 3.042e-01 -4.230 3.06e-05 ***
## poly(vis, 2)2 9.621e-01 2.631e-01 3.657 0.000298 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2486 on 319 degrees of freedom
## Multiple R-squared: 0.7779, Adjusted R-squared: 0.7709
## F-statistic: 111.7 on 10 and 319 DF, p-value: < 2.2e-16
par(mfrow=c(2,2))
plot(oz.poly3)
par(mfrow=c(1,1))
residualPlots(oz.poly3)
## Test stat Pr(>|t|)
## poly(day, 3) NA NA
## hum2 -1.715 0.087
## safb 1.408 0.160
## poly(inbh, 3) NA NA
## poly(vis, 2) NA NA
## Tukey test 1.249 0.211
oz.poly.step = step(oz.poly3)
## Start: AIC=-907.89
## upoz^0.333 ~ poly(day, 3) + hum2 + safb + poly(inbh, 3) + poly(vis,
## 2)
##
## Df Sum of Sq RSS AIC
## - hum2 1 0.0182 19.730 -909.59
## <none> 19.712 -907.89
## - poly(vis, 2) 2 1.7204 21.433 -884.28
## - poly(inbh, 3) 3 3.5529 23.265 -859.20
## - poly(day, 3) 3 4.5215 24.234 -845.74
## - safb 1 6.4337 26.146 -816.68
##
## Step: AIC=-909.59
## upoz^0.333 ~ poly(day, 3) + safb + poly(inbh, 3) + poly(vis,
## 2)
##
## Df Sum of Sq RSS AIC
## <none> 19.730 -909.59
## - poly(vis, 2) 2 2.1790 21.910 -879.02
## - poly(inbh, 3) 3 3.6498 23.380 -859.58
## - poly(day, 3) 3 5.4419 25.172 -835.20
## - safb 1 6.4993 26.230 -817.63
summary(oz.poly.step)
##
## Call:
## lm(formula = upoz^0.333 ~ poly(day, 3) + safb + poly(inbh, 3) +
## poly(vis, 2), data = Ozone)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.81107 -0.16265 0.01113 0.16413 0.65740
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.106261 0.102909 10.750 < 2e-16 ***
## poly(day, 3)1 -0.903432 0.277495 -3.256 0.001252 **
## poly(day, 3)2 -2.446519 0.345127 -7.089 8.69e-12 ***
## poly(day, 3)3 0.487751 0.273491 1.783 0.075464 .
## safb 0.016957 0.001652 10.267 < 2e-16 ***
## poly(inbh, 3)1 -2.336157 0.325503 -7.177 5.00e-12 ***
## poly(inbh, 3)2 -0.552420 0.257824 -2.143 0.032897 *
## poly(inbh, 3)3 0.638250 0.250788 2.545 0.011397 *
## poly(vis, 2)1 -1.343946 0.285044 -4.715 3.62e-06 ***
## poly(vis, 2)2 0.993020 0.256543 3.871 0.000132 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2483 on 320 degrees of freedom
## Multiple R-squared: 0.7777, Adjusted R-squared: 0.7714
## F-statistic: 124.4 on 9 and 320 DF, p-value: < 2.2e-16
The residual plots and curvature tests suggest no model violation deficiencies. To further refine our model we again perform stepwise selection, which leads to the humidity squared term being dropped further simplifying our model.
This example does not lead us to the BEST model, but it is adequate.
Remember all models are WRONG but some are useful!